The purpose of this paper is to extend the investigation of Poisson-type deviation inequalities 
started by Joulin (Bernoulli 13 (2007) 782-798) to the empirical mean of positively curved 
Markov jump processes. In particular, our main result generalizes the tail estimates given by 
Lezaud (Ann. Appl. Probab. 8 (1998) 849-867, ESAIM Probab. Statist. 5 (2001) 183-201). An 
application to birth-death processes completes this work. 

1. Introduction 

Let $(X_t)_{t\ge 0}$ be an ergodic Markov process on a Polish state space $\mathcal{X}$, with stationary 
distribution $\pi$. The well-known ergodic theorem asserts that for any integrable function 
$\phi \in L^1(\pi)$, the empirical mean $t^{-1}\int_0^t \phi(X_s)\,ds$ converges in probability to the average 
$\pi(\phi) := \int_{\mathcal{X}} \phi\,d\pi$ as $t$ goes to infinity. Although large deviations theory gives the speed 
of convergence at infinity, such an asymptotic bound is unsatisfactory when one wants 
to estimate the minimum time to run a simulation algorithm in order to achieve a 
prescribed level of accuracy. The problem of finding non-asymptotic estimates 
has been raised and addressed by several authors. Using the Lumer-Phillips theorem 
for a general Markov process $(X_t)_{t\ge 0}$, Wu (2000) derived an exponential decay on the 
deviation probability 



\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y \right), \qquad y > 0, \tag{1.1}
\]



available for any fixed time $t$. Although Wu's estimate is sharp in large time, such an 
upper bound is not explicit in the parameter $y$. More recently, this result has been extended 
in the diffusion framework by various authors who obtained qualitative upper bounds on 
(1.1), provided the stationary distribution $\pi$ satisfies some functional inequalities such 
as Poincaré, log-Sobolev or transportation-type inequalities; see, for instance, the recent 
articles of Cattiaux and Guillin (2008), Djellout et al. (2004), Gourcy and Wu (2006) or 
Guillin et al. (2009). However, the functional inequalities approach does not seem to be 
relevant for Markov jump processes because this theory is not yet well developed for 
discrete gradients. To the author's knowledge, the problem of determining non-asymptotic 
upper bounds on the deviation probability (1.1) in this context has been investigated by 
few authors. For instance, under a spectral gap assumption and using Kato's perturbation 
theory for linear operators, Lezaud (1998, 2001) established Poisson-type deviation 
bounds, that is, upper bounds of the order $e^{-y\log(y)}$ for large $y$, provided the function $\phi$ 
and the generator of the process are bounded. On the other hand, in the case of 
birth-death processes admitting a so-called Lipschitz spectral gap, Liu and Ma (2009) recently 
extended such tail estimates to Lipschitz functions $\phi$ by using martingale techniques and 
convex concentration inequalities. 

The purpose of this paper is to present a new Poisson-type upper bound for the deviation 
probability (1.1) for a general Markov jump process $(X_t)_{t\ge 0}$. Our approach relies on 
the notion of Wasserstein curvature recently investigated by Joulin (2007), where several 
tail estimates were obtained for the random variable $\phi(X_t)$. Hence we extend in this 
article our previous work to the path-dependent integral $t^{-1}\int_0^t \phi(X_s)\,ds$. In essence, the 
Wasserstein curvature characterizes a contraction property of the associated semigroup 
on the space of probability measures on $\mathcal{X}$, endowed with a suitable Wasserstein distance. 
Since the positively curved case is closely related to the speed of ergodicity of the process, 
we expect to obtain under this assumption a convenient upper bound on (1.1) in large 
time. 

The paper is organized as follows. In Section 2, we recall the definition of the Wasserstein 
curvature of a Markov jump process $(X_t)_{t\ge 0}$. Next, we state the main contribution 
of the paper, Theorem 2.6, in which a Poisson-type deviation bound is established in the 
positively curved case for the empirical mean $t^{-1}\int_0^t \phi(X_s)\,ds$, where $\phi$ is only Lipschitz. 
Hence we extend the tail estimates given in the bounded case by Lezaud (1998, 2001). 
Section 3 is devoted to the proof of Theorem 2.6, which is rather technical and divided 
into several lemmas. The key point of the proof corresponds to Lemma 3.2, with the 
tensorization of a Laplace transform. Section 4 is devoted to the case of birth-death 
processes. More precisely, we compute the explicit expression of the Wasserstein curvature 
with respect to a large class of metrics on $\mathbb{N}$. In particular, by choosing a convenient 
metric related to the transition rates of the associated generator, we are able to apply 
our deviation inequality to birth-death processes with not necessarily bounded generators, 
such as the classical $M/M/\infty$ queueing process. 

2. Preliminaries and main result 

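Throughout the paper, $\mathcal{X}$ is a Polish space endowed with a metric $d$, the space $\mathscr{B}(\mathcal{X})$ 
consists of bounded measurable functions on $\mathcal{X}$ equipped with the supremum norm 
$\|f\|_\infty = \sup_{x\in\mathcal{X}} |f(x)|$ and $\mathrm{Lip}_d(\mathcal{X})$ is the space of Lipschitz functions on $\mathcal{X}$ with a 
Lipschitz seminorm defined by 

\[
\|f\|_{\mathrm{Lip}_d} := \sup_{x\neq y} \frac{|f(x)-f(y)|}{d(x,y)}.
\]

On a filtered probability space $(\Omega, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$, let $\{(X_t)_{t\ge 0}, (\mathbb{P}_x)_{x\in\mathcal{X}}\}$ be an 
$\mathcal{X}$-valued càdlàg Markov jump process with a generator given for any function $f \in \mathscr{B}(\mathcal{X})$ 
by 

\[
\mathcal{L}f(x) := \int_{\mathcal{X}} (f(y)-f(x))\,Q(x,dy), \qquad x\in\mathcal{X}.
\]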

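Here the transition kernel $Q$ is assumed to be stable and conservative: for any $x \in \mathcal{X}$ and 
any Borel set $A$, 

\[
Q(x,\mathcal{X}) < +\infty, \qquad \lim_{t\to 0} \frac{P_t(x,A) - 1_A(x)}{t} = Q(x,A) - Q(x,\mathcal{X})\,1_A(x),
\]

where $P_t(x,dy) := \mathbb{P}_x(X_t \in dy)$ denotes the transition probability of the process. Let 
$(P_t)_{t\ge 0}$ be the associated Markov semigroup acting on the space $\mathscr{B}(\mathcal{X})$ as follows: 

\[
P_t f(x) := \mathbb{E}_x[f(X_t)] = \int_{\mathcal{X}} f(y)\,P_t(x,dy), \qquad x\in\mathcal{X}.
\]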

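Denote by $\mathscr{P}_d(\mathcal{X})$ the space of probability measures $\mu$ on $\mathcal{X}$ such that $\int_{\mathcal{X}} d(x,y)\,\mu(dy) < +\infty$ 
for some (or equivalently for all) $x \in \mathcal{X}$. If the Markov kernel $P_t(x,\cdot) \in \mathscr{P}_d(\mathcal{X})$ for 
any $t > 0$ and any $x \in \mathcal{X}$, then the semigroup is well defined on the space $\mathrm{Lip}_d(\mathcal{X})$ and 
we introduce in this case the function 

\[
\sigma_d(t) := -\sup\{\log \|P_t f\|_{\mathrm{Lip}_d} : \|f\|_{\mathrm{Lip}_d} = 1\}, \qquad t > 0,
\]

with $\sigma_d(0) = 0$. By the Markov property, the function $\sigma_d(\cdot)$ is super-additive so that the 
following limit is well defined: 

\[
\sigma_d := \lim_{t\downarrow 0} \frac{\sigma_d(t)}{t} = \inf_{t>0} \frac{\sigma_d(t)}{t}. \tag{2.1}
\]

In particular, the number $\sigma_d$ is the best (maximal) constant $\sigma$ in the contraction 
inequality 

\[
\|P_t f\|_{\mathrm{Lip}_d} \le e^{-\sigma t}\,\|f\|_{\mathrm{Lip}_d}, \qquad f \in \mathrm{Lip}_d(\mathcal{X}),\ t > 0. \tag{2.2}
\]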

Let us recall the definition of Wasserstein curvature of the Markov jump process $(X_t)_{t\ge 0}$ 
given by Joulin (2007), up to a slight modification. 

Definition 2.1. Assume $P_t(x,\cdot) \in \mathscr{P}_d(\mathcal{X})$ for any $t > 0$ and any $x \in \mathcal{X}$. The number $\sigma_d$ 
given by (2.1) is called the Wasserstein curvature of the process $(X_t)_{t\ge 0}$ with respect to 
the metric $d$. 

Remark 2.2. In the remainder of this paper, we will remove the metric symbol $d$ in the 
definition of the Wasserstein curvature $\sigma_d$ when there is no risk of confusion. Moreover, 
we will assume implicitly that the Markov kernel $P_t(x,\cdot)$ belongs to the space $\mathscr{P}_d(\mathcal{X})$ for 
any $t > 0$ and any $x \in \mathcal{X}$. 



We define the Wasserstein distance $W_d(\mu,\nu)$ between two probability measures $\mu, \nu \in \mathscr{P}_d(\mathcal{X})$ as 

\[
W_d(\mu,\nu) := \inf_{\gamma} \int_{\mathcal{X}\times\mathcal{X}} d(x,y)\,\gamma(dx,dy),
\]

where the infimum is taken over all $\gamma \in \mathscr{P}_d(\mathcal{X}\times\mathcal{X})$ with marginals $\mu$ and $\nu$. The 
Kantorovich-Rubinstein duality theorem allows us to rewrite the Wasserstein distance 
as 

\[
W_d(\mu,\nu) = \sup\left\{ \left| \int_{\mathcal{X}} f\,d\mu - \int_{\mathcal{X}} f\,d\nu \right| : \|f\|_{\mathrm{Lip}_d} \le 1 \right\};
\]

see, for instance, Chen (2004), Theorem 5.10. Hence the Wasserstein curvature $\sigma$ is also 
the best (maximal) constant $\sigma$ in the inequality 

\[
W_d(P_t(x,\cdot), P_t(y,\cdot)) \le e^{-\sigma t}\,d(x,y), \qquad x,y \in \mathcal{X},\ t > 0. \tag{2.3}
\]

Remark 2.3. As noted by Joulin (2007), our definition of Wasserstein curvature of 
Markov processes is inspired by the continuous setting of Brownian motion on Riemannian 
manifolds studied by Sturm and Von Renesse (2005), where it is stated that the 
contraction inequality (2.3) characterizes uniform lower bounds on the Ricci curvature 
of the manifold. However, after our paper was published, we learned that a similar 
notion of curvature for Markov processes relying on such an inequality had been previously 
introduced in the PhD thesis of Sammer (2005) under the name "Ricci-Wasserstein 
curvature", and later independently by Ollivier (2009, 2007b) as the "Ricci curvature" of 
Markov chains on metric spaces. Actually, without the link to geometry, the inequality 
(2.3) appeared first in the work of Dobrushin (1970) with his study on random fields, 
and is known in statistical mechanics as the "Dobrushin uniqueness condition". Moreover, 
such a contraction inequality is fundamental to estimate the spectral gap $\lambda_1$ (say) 
of reversible Markov processes, or equivalently to establish a Poincaré inequality for the 
stationary distribution, since we have $\lambda_1 \ge \sigma$. See, for instance, Chen (2004), Chapter 9, 
for a summary of and precise references for this topic. 

Actually, the Wasserstein curvature is closely related to the ergodicity of the process, as 
illustrated by the following result. See, for instance, the very general result of Dobrushin 
(1970), Theorem 3, for a proof in the discrete-time case or Chen (2004), Theorem 5.23, 
in the continuous-time setting. 

Theorem 2.4. Assume $\sigma > 0$. Then the process $(X_t)_{t\ge 0}$ admits a unique stationary 
distribution $\pi \in \mathscr{P}_d(\mathcal{X})$ and is ergodic in the following sense: for any initial point $x \in \mathcal{X}$, 

\[
W_d(P_t(x,\cdot),\pi) \le e^{-\sigma t} \int_{\mathcal{X}} d(x,y)\,\pi(dy) \xrightarrow[t\to+\infty]{} 0. \tag{2.4}
\]

Remark 2.5. When $d$ is the trivial metric on $\mathcal{X}$ defined by $d(x,y) = 1_{\{x\neq y\}}$, the Wasserstein 
distance is nothing but half of the total variation norm. Therefore, the convergence 
in Wasserstein distance generalizes the classical convergence in total variation used in 
the context of general Markov processes. 

Under the ergodic property of the process, the celebrated ergodic theorem states that 
for any $\phi \in L^1(\pi)$, the empirical mean $t^{-1}\int_0^t \phi(X_s)\,ds$ converges in probability as $t$ goes 
to infinity to the equilibrium $\pi(\phi) := \int_{\mathcal{X}} \phi\,d\pi$, where $\pi$ denotes the unique stationary 
distribution given by Theorem 2.4. It is well known that the determination of qualitative 
non-asymptotic deviation inequalities is of fundamental importance for simulation 
algorithms. However, the theory of large deviations provides a bound for this convergence 
that is only asymptotic in time on the one hand, and whose behaviour in terms of the 
deviation level is not explicit on the other hand. Hence one may wonder if Wasserstein 
curvature plays a crucial role in the determination of such tail estimates relating the 
speed of ergodicity of the process. We give now an affirmative answer to this question by 
stating the main result of the paper, the proof of which is given in the next section. In 
the remainder of the paper, we denote the function 

\[
g(u) := (1+u)\log(1+u) - u, \qquad u \ge 0. \tag{2.5}
\]



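Theorem 2.6. Assume $\sigma > 0$ and that there exist two positive constants $b$ and $V$ such 
that 

\[
\sup_{t>0} d(X_{t-}, X_t) \le b \qquad\text{and}\qquad \left\| \int_{\mathcal{X}} d(\cdot,y)^2\,Q(\cdot,dy) \right\|_\infty \le V^2. \tag{2.6}
\]

Letting $\phi \in \mathrm{Lip}_d(\mathcal{X})$, for any initial state $x \in \mathcal{X}$, any $t > 0$ and any $y > 0$ we have the 
Poisson-type deviation inequality: 

\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -\frac{V^2 t}{b^2}\, g\!\left( \frac{b\,y\,\sigma}{V^2 (1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}, \tag{2.7}
\]

where $\pi$ denotes the unique stationary distribution given in Theorem 2.4 and 

\[
M_t^x := \frac{(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}}{\sigma t} \int_{\mathcal{X}} d(x,z)\,\pi(dz) \xrightarrow[t\to+\infty]{} 0.
\]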



Let us give some comments on this result. 



Remark 2.7. According to a classical large deviation result, the estimate (2.7) is optimal 
in large time since the bound decays exponentially in $t$, and is also sharp in small time. 
Moreover, the function $u \mapsto g(u)$ is equivalent to $u^2/2$ as $u$ is close to $0$ and to $u\log(u)$ as 
$u$ tends to infinity. Hence, for sufficiently large $t$ the inequality (2.7) exhibits a Gaussian 
regime for small values of the deviation level $y$, in accordance with the central limit 
theorem for Markov processes, and a Poisson regime for its large values. 
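
The two regimes of Remark 2.7 can be checked numerically. The following Python sketch evaluates the function $g$ of (2.5) and the right-hand side of (2.7). The names `g` and `poisson_bound`, the argument `V2` (standing for $V^2$) and all parameter values are illustrative assumptions, not objects taken from the paper.

```python
import math


def g(u: float) -> float:
    """g(u) = (1 + u) log(1 + u) - u from (2.5), for u >= 0."""
    return (1.0 + u) * math.log1p(u) - u


def poisson_bound(y, t, sigma, b, V2, lip):
    """Right-hand side of (2.7).

    V2 stands for V**2 and lip for the Lipschitz seminorm of phi.
    """
    u = b * y * sigma / (V2 * (1.0 - math.exp(-sigma * t)) * lip)
    return 2.0 * math.exp(-(V2 * t / b**2) * g(u))


# Gaussian regime: g(u) is close to u**2 / 2 for small u.
small = 1e-3
ratio_small = g(small) / (small**2 / 2)

# Poisson regime: g(u) / (u log u) tends (slowly) to 1 as u grows.
large = 1e12
ratio_large = g(large) / (large * math.log(large))

# Hypothetical parameter values, chosen only to exercise the formula.
bound = poisson_bound(y=0.5, t=50.0, sigma=1.0, b=1.0, V2=1.0, lip=1.0)
```

The ratio in the Poisson regime approaches 1 only at the rate $1/\log u$, which is why a very large argument is used in the sketch.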

Remark 2.8. Assume that the process is reversible. As noted in Remark 2.3, the 
positivity of the Wasserstein curvature ensures the existence of a spectral gap $\lambda_1$ of the 
underlying generator, that is, $\lambda_1 \ge \sigma > 0$. Therefore, using the Poincaré inequality, the 
asymptotic variance of the empirical mean is bounded by $V^2\|\phi\|^2_{\mathrm{Lip}_d}/\lambda_1^2$ and one deduces 
that the right-hand side of (2.7) is sharp in $\sigma$ in the Gaussian regime since it behaves as 
$e^{-t y^2 \sigma^2/(2V^2\|\phi\|^2_{\mathrm{Lip}_d})}$ for large time. 

Remark 2.9. Up to constant factors, we extend the Chernoff inequalities established 
by Lezaud (1998, 2001), because boundedness assumptions are required neither on the 
function $\phi$ nor on the generator. Note, however, that if the metric $d$ is such that 
$\inf_{x\neq y} d(x,y) > 0$, then the finiteness of $V$ implies that the generator is bounded. In 
particular, when $d$ is the trivial metric, we recover Lezaud's results since we have in this 
case $\mathrm{Lip}_d(\mathcal{X}) = \mathscr{B}(\mathcal{X})$ and $V^2 = \|Q(\cdot,\mathcal{X})\|_\infty$. Nevertheless, the price to pay in Theorem 
2.6 is to assume $\sigma > 0$, which is a stronger assumption in the reversible case than the 
existence of a spectral gap required by Lezaud. 

Remark 2.10. Consider for instance the Langevin-type diffusion process solution of the 
following stochastic differential equation $dX_t = \sqrt{2}\,dB_t - \nabla U(X_t)\,dt$, where $(B_t)_{t\ge 0}$ is a 
standard Brownian motion on the Euclidean space $(\mathbb{R}^n, d)$ and $U$ is a regular potential 
such that $\int e^{-U(x)}\,dx = 1$. Denote by $\pi(dx) = e^{-U(x)}\,dx$ the stationary distribution of 
the process $(X_t)_{t\ge 0}$. Since the Wasserstein curvature can be defined in the diffusion 
framework, a step-by-step adaptation of the proof of Theorem 2.6 below, especially the 
proof of Lemma 3.1, entails for any Lipschitz function $\phi$ on $(\mathbb{R}^n, d)$ a Gaussian deviation 
inequality of the form 

\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -\frac{t\,y^2\sigma^2}{4(1-e^{-\sigma t})^2\,\|\phi\|^2_{\mathrm{Lip}_d}} \right\},
\]

provided the Wasserstein curvature of the process $(X_t)_{t\ge 0}$ is positive. A sufficient 
condition ensuring this positivity is given by the Bakry-Émery curvature criterion, see 
Bakry and Émery (1985), under which the authors established a logarithmic Sobolev 
inequality for the stationary distribution $\pi$. On the other hand, it is classical that such 
a functional inequality entails a similar Gaussian decay to that given above; see, for 
instance, Wu (2000) or the recent article of Guillin et al. (2009). Hence we give under 
comparable assumptions another proof of this Gaussian tail estimate. 

Remark 2.11. As illustrated for birth-death processes in Section 4, it is sufficient to 
carry out the analysis in the one-dimensional case since the Wasserstein curvature tensorizes 
on product spaces equipped with the $\ell^1$-metric. Indeed, for each $i = 1,\dots,N$, consider 
the Markov process $(X_t^i)_{t\ge 0}$ with transition kernel $Q^i$, stationary distribution $\pi^i$ and 
Wasserstein curvature $\sigma^i$, all valued in the same Polish space $(\mathcal{Y},\rho)$ to simplify. We 
construct the multidimensional Markov process $(X_t)_{t\ge 0}$ valued in $(\mathcal{X}, d)$, where $\mathcal{X} := \mathcal{Y}^N$ 
and $d$ is the $\ell^1$-metric defined with respect to $\rho$, as follows: choose first a coordinate 
uniformly at random and then let the univariate dynamics run according to this direction. 
Then the stationary distribution $\pi$ is given by $\pi = \otimes_{i=1}^N \pi^i$. Now let $\mu$ and $\nu$ be two 
product probability measures on $\mathcal{X}$. Then the classical tensorization property of the 
Wasserstein distance is given by $W_d(\mu,\nu) = \sum_{i=1}^N W_\rho(\mu^i,\nu^i)$; see, for instance, Sammer 
(2005), Lemma 2.2.6, for a proof. Hence, the Wasserstein curvature $\sigma$ with respect to the 
metric $d$ of the Markov process $(X_t)_{t\ge 0}$ is computed as $\sigma = \min_{i=1,\dots,N} \sigma^i/N$. Moreover, 
if we denote by $b_i$ and $V_i$ the numbers in (2.6) related to the coordinate process $(X_t^i)_{t\ge 0}$, 
then Theorem 2.6 applies for the multidimensional Markov process $(X_t)_{t\ge 0}$ with $\sigma$ and 
$\pi$ as above and with $b := \max_{i=1,\dots,N} b_i$ and $V^2 := \sum_{i=1}^N V_i^2/N$. 

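To illustrate our argument, consider the symmetric continuous-time random walk 
$(X_t)_{t\ge 0}$ on the discrete cube $\{0,1\}^N$, equipped with the Hamming metric 
$d(x,y) = \sum_{k=1}^N 1_{\{x_k \neq y_k\}}$. The associated transition kernel is given by 

\[
Q(x,y) = \frac{1}{2N} \sum_{k=1}^N 1_{\{y = x^{(k)}\}}, \qquad x,y \in \{0,1\}^N,
\]

where $x^{(k)}$ denotes the point obtained from $x$ by flipping its $k$-th coordinate, 
and the stationary distribution is the uniform probability measure on $\{0,1\}^N$, say $\pi^{\otimes N}$. 
Since in the one-dimensional case a simple calculation shows that the Wasserstein 
curvature with respect to the trivial metric equals 1, the Wasserstein curvature on the 
product space with respect to the Hamming metric is $\sigma = 1/N$. Moreover, we have $b = 1$ 
and $V^2 = 1/2$ so that by Theorem 2.6 the following deviation inequality holds for any 
Lipschitz function $\phi$ with respect to the Hamming metric on $\{0,1\}^N$: 

\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi^{\otimes N}(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -\frac{t}{2}\, g\!\left( \frac{2y}{N(1-e^{-t/N})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}.
\]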



3. Proof of Theorem 2.6 

This section is devoted to the proof of Theorem 2.6, which is rather technical and di- 
vided into several lemmas. First, we give a convenient upper bound in large time on a 
univariate Laplace transform, see Lemma 3.1 below. Using the method of tensorization, 
the extension to the multidimensional case is considered in Lemma 3.2. Finally, with the 
help of the previous lemmas and by a suitable approximation of the empirical mean, we 
finish the proof of Theorem 2.6. 

Let us establish first an upper bound on the Laplace transform of a Lipschitz function 
of the process $(X_t)_{t\ge 0}$. The proof, which is a straightforward adaptation of Joulin (2007), 
Theorem 3.1, is given for completeness. 



A new Poisson-type deviation inequality 539 

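Lemma 3.1. Under the assumptions of Theorem 2.6, for any $f \in \mathrm{Lip}_d(\mathcal{X})$, any $x \in \mathcal{X}$, 
any $\tau > 0$ and any $t > 0$, we have 

\[
\mathbb{E}_x\left[ e^{\tau(f(X_t) - \mathbb{E}_x[f(X_t)])} \right] \le \exp\{ h(\tau, t, b\|f\|_{\mathrm{Lip}_d}) \}, \tag{3.1}
\]

where $h$ is the function defined on $(0,+\infty)^3$ by 

\[
h(\tau,t,z) := \frac{V^2(1-e^{-2\sigma t})}{2\sigma b^2}\,(e^{\tau z} - \tau z - 1). \tag{3.2}
\]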

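Proof. Assume first that the Lipschitz function $f$ is bounded. Then the process 
$(Z_s^f)_{0\le s\le t}$ given by $Z_s^f := P_{t-s}f(X_s) - P_t f(X_0)$ is a real-valued $\mathbb{P}_x$-martingale with 
respect to the filtration $(\mathcal{F}_s)_{0\le s\le t}$. Using (2.2) and (2.6), we have 

\[
\sup_{0<s\le t} |Z_s^f - Z_{s-}^f| = \sup_{0<s\le t} |P_{t-s}f(X_s) - P_{t-s}f(X_{s-})| \le b\,\|f\|_{\mathrm{Lip}_d},
\]

and also 

\[
\langle Z^f, Z^f\rangle_s = \int_0^s \int_{\mathcal{X}} \big(P_{t-r}f(y) - P_{t-r}f(X_{r-})\big)^2\,Q(X_{r-},dy)\,dr
\le \frac{(1-e^{-2\sigma t})\,V^2\,\|f\|^2_{\mathrm{Lip}_d}}{2\sigma}.
\]

By Kallenberg (1997), Lemma 23.19, the process given for any $\tau > 0$ by 

\[
\left( \exp\left\{ \tau Z_s^f - b^{-2}\|f\|^{-2}_{\mathrm{Lip}_d}\big(e^{\tau b\|f\|_{\mathrm{Lip}_d}} - \tau b\|f\|_{\mathrm{Lip}_d} - 1\big)\langle Z^f, Z^f\rangle_s \right\} \right)_{0\le s\le t}
\]

is a $\mathbb{P}_x$-supermartingale with respect to $(\mathcal{F}_s)_{0\le s\le t}$. Thus, using the two previous 
estimates, we get for any $\tau > 0$: 

\[
\mathbb{E}_x\left[ e^{\tau(f(X_t) - \mathbb{E}_x[f(X_t)])} \right] = \mathbb{E}_x\left[ e^{\tau Z_t^f} \right]
\le \exp\left\{ \frac{(1-e^{-2\sigma t})\,V^2}{2\sigma b^2}\big(e^{\tau b\|f\|_{\mathrm{Lip}_d}} - \tau b\|f\|_{\mathrm{Lip}_d} - 1\big) \right\}.
\]

To remove the boundedness assumption on $f$, consider the sequence of bounded functions 
$f_n := \max\{-n, \min\{f, n\}\}$ converging pointwise to $f$. Then it is routine to show that 
$(f_n)_{n\in\mathbb{N}}$ is uniformly integrable with respect to the probability measure $P_t(x,\cdot)$, which 
implies the $L^1$-convergence. Finally, since the functions $f_n$ are Lipschitz with a constant 
of at most $\|f\|_{\mathrm{Lip}_d}$ and $h$ is non-decreasing in its last variable, the use of Fatou's lemma 
achieves the proof. □ 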



Our present purpose is to extend to the multidimensional case the Laplace transform 
estimate (3.1) by using the method of tensorization. 

Given $n \in \mathbb{N}\setminus\{0\}$, define $\mathrm{Lip}_{d_n}(\mathcal{X}^n)$ as the space of real Lipschitz functions $f$ on the 
product space $\mathcal{X}^n$, endowed with the seminorm 

\[
\|f\|_{\mathrm{Lip}_{d_n}} := \sup_{x\neq y} \frac{|f(x)-f(y)|}{d_n(x,y)},
\]

where $d_n$ is the $\ell^1$-distance on $\mathcal{X}^n$ with respect to the metric $d$, that is, 
$d_n(x,y) := \sum_{k=1}^n d(x_k,y_k)$. 

Lemma 3.2. We assume that the hypothesis of Theorem 2.6 is fulfilled. Define the sample 
$X^n$ of the process $(X_t)_{t\ge 0}$ by $X^n = (X_{t_1},\dots,X_{t_n})$, $0 =: t_0 < t_1 < \cdots < t_n$, and let 
$f \in \mathrm{Lip}_{d_n}(\mathcal{X}^n)$. Then for any initial state $x \in \mathcal{X}$ and any $\tau > 0$, we have the 
multidimensional Laplace transform estimate: 

\[
\mathbb{E}_x\left[ e^{\tau(f(X^n) - \mathbb{E}_x[f(X^n)])} \right] \le \exp\left\{ \sum_{k=1}^n h\big(\tau, t_k - t_{k-1}, s_k\,b\,\|f\|_{\mathrm{Lip}_{d_n}}\big) \right\}, \tag{3.3}
\]

where the function $h$ is defined in Lemma 3.1 and $s_k := \sum_{l=k}^n e^{-\sigma(t_l - t_k)}$. 
Proof. Let $f_n := f$ and define for any $k = 1,\dots,n-1$, the function $f_k$ on $\mathcal{X}^k$ by 

\[
\begin{aligned}
f_k(x_1,\dots,x_k) &:= \int_{\mathcal{X}^{n-k}} f(x_1,\dots,x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_{k+1}-t_k}(x_k,dx_{k+1}) \\
&= \int_{\mathcal{X}} f_{k+1}(x_1,\dots,x_k,x_{k+1})\,P_{t_{k+1}-t_k}(x_k,dx_{k+1}).
\end{aligned}
\]

We divide the proof of Lemma 3.2 into two parts. 

• Step 1: By a downward recursive argument on $k$, let us show first that the univariate 
function $x_k \mapsto f_k(*,x_k)$ is Lipschitz with respect to the metric $d$, with furthermore the 
inequality 

\[
\sup_{x_1,\dots,x_{k-1}\in\mathcal{X}} \|f_k(x_1,\dots,x_{k-1},\cdot)\|_{\mathrm{Lip}_d} \le s_k\,\|f\|_{\mathrm{Lip}_{d_n}}. \tag{3.4}
\]

Since $s_n = 1$, the property (3.4) is trivially true for $k = n$. 

Assume now that (3.4) is satisfied for some $k \in \{2,\dots,n\}$. First, letting $x_1,\dots,x_{k-2},y,z,x_k \in \mathcal{X}$, 
we have: 

\[
\begin{aligned}
&|f_k(x_1,\dots,x_{k-2},y,x_k) - f_k(x_1,\dots,x_{k-2},z,x_k)| \\
&\quad= \bigg| \int_{\mathcal{X}^{n-k}} f(x_1,\dots,x_{k-2},y,x_k,x_{k+1},\dots,x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_{k+1}-t_k}(x_k,dx_{k+1}) \\
&\qquad - \int_{\mathcal{X}^{n-k}} f(x_1,\dots,x_{k-2},z,x_k,x_{k+1},\dots,x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_{k+1}-t_k}(x_k,dx_{k+1}) \bigg| \\
&\quad\le \|f\|_{\mathrm{Lip}_{d_n}}\,d(y,z) \int_{\mathcal{X}^{n-k}} P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_{k+1}-t_k}(x_k,dx_{k+1}) \\
&\quad= \|f\|_{\mathrm{Lip}_{d_n}}\,d(y,z),
\end{aligned}
\]

so that 

\[
\sup_{x_1,\dots,x_{k-2},x_k\in\mathcal{X}} \|f_k(x_1,\dots,x_{k-2},\cdot,x_k)\|_{\mathrm{Lip}_d} \le \|f\|_{\mathrm{Lip}_{d_n}}. \tag{3.5}
\]

Now, let us show that the property (3.4) is satisfied at the step $k-1$ with the help 
of (3.5). Let $x_1,\dots,x_{k-2},y,z \in \mathcal{X}$. Using the contraction property (2.2) in the second 
inequality below, 

\[
\begin{aligned}
&|f_{k-1}(x_1,\dots,x_{k-2},y) - f_{k-1}(x_1,\dots,x_{k-2},z)| \\
&\quad\le \bigg| \int_{\mathcal{X}} f_k(x_1,\dots,x_{k-2},y,x_k)\,\big(P_{t_k-t_{k-1}}(y,dx_k) - P_{t_k-t_{k-1}}(z,dx_k)\big) \bigg| \\
&\qquad + \int_{\mathcal{X}} |f_k(x_1,\dots,x_{k-2},y,x_k) - f_k(x_1,\dots,x_{k-2},z,x_k)|\,P_{t_k-t_{k-1}}(z,dx_k) \\
&\quad\le e^{-\sigma(t_k-t_{k-1})}\,\|f_k(x_1,\dots,x_{k-2},y,\cdot)\|_{\mathrm{Lip}_d}\,d(y,z) \\
&\qquad + \int_{\mathcal{X}} \|f_k(x_1,\dots,x_{k-2},\cdot,x_k)\|_{\mathrm{Lip}_d}\,d(y,z)\,P_{t_k-t_{k-1}}(z,dx_k) \\
&\quad\le \big(s_k e^{-\sigma(t_k-t_{k-1})} + 1\big)\,\|f\|_{\mathrm{Lip}_{d_n}}\,d(y,z) \\
&\quad= s_{k-1}\,\|f\|_{\mathrm{Lip}_{d_n}}\,d(y,z),
\end{aligned}
\]

where in the last inequality we used assumption (3.4) at the step $k$ together with (3.5). 
Therefore, we obtain the inequality 

\[
\|f_{k-1}(x_1,\dots,x_{k-2},\cdot)\|_{\mathrm{Lip}_d} \le s_{k-1}\,\|f\|_{\mathrm{Lip}_{d_n}},
\]

and the parameters $x_1,\dots,x_{k-2}$ being arbitrary, the property (3.4) is established at the 
step $k-1$, hence in full generality. 

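• Step 2: Proof of the Laplace transform estimate (3.3). 

As before, let us show by a downward recursive argument on $k \in \{2,\dots,n\}$ the following 
inequality: 

\[
\begin{aligned}
\mathbb{E}_x\left[ e^{\tau f(X^n)} \right] \le{}& \exp\left\{ \sum_{l=k}^n h\big(\tau, t_l - t_{l-1}, b\,s_l\,\|f\|_{\mathrm{Lip}_{d_n}}\big) \right\} \\
&\times \int_{\mathcal{X}^{k-1}} e^{\tau f_{k-1}(x_1,\dots,x_{k-1})}\,P_{t_{k-1}-t_{k-2}}(x_{k-2},dx_{k-1})\cdots P_{t_1}(x,dx_1).
\end{aligned} \tag{3.6}
\]

First let $k = n$. By the Markov property, we have 

\[
\begin{aligned}
\mathbb{E}_x\left[ e^{\tau f(X^n)} \right] &= \int_{\mathcal{X}^n} e^{\tau f_n(x_1,\dots,x_n)}\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_1}(x,dx_1) \\
&\le \exp\{ h(\tau, t_n - t_{n-1}, b\,\|f\|_{\mathrm{Lip}_{d_n}}) \} \int_{\mathcal{X}^{n-1}} e^{\tau f_{n-1}(x_1,\dots,x_{n-1})}\,P_{t_{n-1}-t_{n-2}}(x_{n-2},dx_{n-1})\cdots P_{t_1}(x,dx_1),
\end{aligned}
\]

where we used Lemma 3.1 with the univariate Lipschitz function $x_n \mapsto f_n(*,x_n)$ together 
with the inequality (3.4) since the function $h$ is non-decreasing in its last variable. Hence 
(3.6) is established in the case $k = n$. 

Now assume that (3.6) is satisfied for some $k \in \{2,\dots,n\}$. Using the same reasoning 
as above with the Lipschitz function $x_{k-1} \mapsto f_{k-1}(*,x_{k-1})$, we obtain 

\[
\begin{aligned}
\mathbb{E}_x\left[ e^{\tau f(X^n)} \right] \le{}& \exp\left\{ \sum_{l=k}^n h\big(\tau, t_l - t_{l-1}, b\,s_l\,\|f\|_{\mathrm{Lip}_{d_n}}\big) \right\} \\
&\times \int_{\mathcal{X}^{k-1}} e^{\tau f_{k-1}(x_1,\dots,x_{k-1})}\,P_{t_{k-1}-t_{k-2}}(x_{k-2},dx_{k-1})\cdots P_{t_1}(x,dx_1) \\
\le{}& \exp\left\{ \sum_{l=k-1}^n h\big(\tau, t_l - t_{l-1}, b\,s_l\,\|f\|_{\mathrm{Lip}_{d_n}}\big) \right\} \\
&\times \int_{\mathcal{X}^{k-2}} e^{\tau f_{k-2}(x_1,\dots,x_{k-2})}\,P_{t_{k-2}-t_{k-3}}(x_{k-3},dx_{k-2})\cdots P_{t_1}(x,dx_1),
\end{aligned}
\]

so that the inequality (3.6) is satisfied at step $k-1$, hence in full generality. Finally, we 
obtain from (3.6) with $k = 2$ the inequality 

\[
\mathbb{E}_x\left[ e^{\tau f(X^n)} \right] \le \exp\left\{ \sum_{l=2}^n h\big(\tau, t_l - t_{l-1}, b\,s_l\,\|f\|_{\mathrm{Lip}_{d_n}}\big) \right\} \int_{\mathcal{X}} e^{\tau f_1(x_1)}\,P_{t_1}(x,dx_1),
\]

and using once again the same reasoning as before for the Lipschitz function $f_1$ entails 
the desired estimate (3.3). The proof of Lemma 3.2 is complete. □ 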
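
The weights $s_k$ drive the whole tensorization argument, so a quick numerical check of their recursion may be useful. The Python sketch below compares the direct definition $s_k = \sum_{l=k}^n e^{-\sigma(t_l - t_k)}$ with the downward recursion $s_n = 1$, $s_{k-1} = s_k e^{-\sigma(t_k - t_{k-1})} + 1$ used in Step 1, and then evaluates the maximum of the $s_k$ on a regular subdivision. The function names, the time grid and the value of `sigma` are hypothetical choices made for illustration.

```python
import math


def s_direct(times, sigma):
    """s_k = sum_{l=k}^n exp(-sigma (t_l - t_k)) for k = 1..n.

    times is the list [t_0, t_1, ..., t_n].
    """
    n = len(times) - 1
    return [
        sum(math.exp(-sigma * (times[l] - times[k])) for l in range(k, n + 1))
        for k in range(1, n + 1)
    ]


def s_recursive(times, sigma):
    """Same weights via s_n = 1 and s_{k-1} = s_k exp(-sigma (t_k - t_{k-1})) + 1."""
    n = len(times) - 1
    s = [0.0] * (n + 1)  # index 0 is unused
    s[n] = 1.0
    for k in range(n, 1, -1):
        s[k - 1] = s[k] * math.exp(-sigma * (times[k] - times[k - 1])) + 1.0
    return s[1:]


# Hypothetical irregular grid 0 = t_0 < t_1 < ... < t_n and curvature value.
times = [0.0, 0.3, 0.7, 1.5, 2.0]
sigma = 0.8
direct = s_direct(times, sigma)
recursive = s_recursive(times, sigma)

# Regular subdivision t_k = k t / n: the largest weight is s_1.
t, n = 2.0, 10
regular = [k * t / n for k in range(n + 1)]
s_max = max(s_direct(regular, sigma))
closed_form = (1.0 - math.exp(-sigma * t)) / (1.0 - math.exp(-sigma * t / n))
```

On a regular subdivision the maximum is a finite geometric sum, which is the closed form compared against `s_max` in the sketch.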

Now we are able to prove Theorem 2.6, with the help of Lemma 3.2. 

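Proof of Theorem 2.6. Define the sample $X^n = (X_{t_1},\dots,X_{t_n})$, where the sequence 
$t_k = kt/n$, $k = 0,\dots,n$, is a regular subdivision of the time interval $[0,t]$. Since 
$\phi \in \mathrm{Lip}_d(\mathcal{X})$, the function $f$ given by $f(z_1,\dots,z_n) := n^{-1}\sum_{k=1}^n \phi(z_k)$, $(z_1,\dots,z_n) \in \mathcal{X}^n$, is 
Lipschitz on the product space $\mathcal{X}^n$ with respect to the $\ell^1$-metric $d_n$ and its Lipschitz 
seminorm satisfies $\|f\|_{\mathrm{Lip}_{d_n}} \le n^{-1}\|\phi\|_{\mathrm{Lip}_d}$. Note that the function $h$ defined by (3.2) is 
non-decreasing in its last variable. Hence, since we have 

\[
\max_{k=1,\dots,n} s_k = \frac{1-e^{-\sigma t}}{1-e^{-\sigma t/n}},
\]

the multidimensional Laplace transform estimate (3.3) of Lemma 3.2 implies the following 
upper bound: 

\[
\mathbb{E}_x\left[ e^{\tau(f(X^n) - \mathbb{E}_x[f(X^n)])} \right] \le \exp\left\{ n\,h\!\left( \tau, \frac{t}{n}, \frac{b\,(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}}{n\,(1-e^{-\sigma t/n})} \right) \right\}, \qquad \tau > 0.
\]

Therefore, by Chebyshev's inequality, we get for any $y > 0$: 

\[
\begin{aligned}
\mathbb{P}_x\big( f(X^n) - \mathbb{E}_x[f(X^n)] > y \big)
&\le \inf_{\tau>0} e^{-\tau y}\,\mathbb{E}_x\left[ e^{\tau(f(X^n) - \mathbb{E}_x[f(X^n)])} \right] \\
&\le \exp\left\{ -\frac{nV^2}{2b^2\sigma}(1-e^{-2\sigma t/n})\, g\!\left( \frac{2b\sigma y\,(1-e^{-\sigma t/n})}{V^2(1-e^{-2\sigma t/n})(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}.
\end{aligned}
\]

Applying also the same reasoning to the function $-f$ yields 

\[
\mathbb{P}_x\big( |f(X^n) - \mathbb{E}_x[f(X^n)]| > y \big)
\le 2\exp\left\{ -\frac{nV^2}{2b^2\sigma}(1-e^{-2\sigma t/n})\, g\!\left( \frac{2b\sigma y\,(1-e^{-\sigma t/n})}{V^2(1-e^{-2\sigma t/n})(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}. \tag{3.7}
\]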



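Now, using the invariance property of the stationary distribution $\pi$ and the contraction 
property (2.2), 

\[
\begin{aligned}
|\mathbb{E}_x[f(X^n)] - \pi(\phi)|
&\le \frac{1}{n}\sum_{k=1}^n \int_{\mathcal{X}} |P_{kt/n}\phi(x) - P_{kt/n}\phi(y)|\,\pi(dy) \\
&\le \frac{1}{n}\sum_{k=1}^n e^{-\sigma kt/n}\,\|\phi\|_{\mathrm{Lip}_d} \int_{\mathcal{X}} d(x,y)\,\pi(dy) \\
&\le \frac{(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}}{t\sigma} \int_{\mathcal{X}} d(x,y)\,\pi(dy) = M_t^x.
\end{aligned}
\]

Hence the inequality (3.7) entails for any $y > 0$, 

\[
\mathbb{P}_x\left( \left| \frac{1}{n}\sum_{k=1}^n \phi(X_{kt/n}) - \pi(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -\frac{nV^2}{2b^2\sigma}(1-e^{-2\sigma t/n})\, g\!\left( \frac{2b\sigma y\,(1-e^{-\sigma t/n})}{V^2(1-e^{-2\sigma t/n})(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}, \tag{3.8}
\]

where $M_t^x$ is the quantity defined in Theorem 2.6. 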



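To finish the proof, note that since the process $(X_t)_{t\ge 0}$ is càdlàg and the function 
$\phi$ is Lipschitz, the process $(\phi(X_t))_{t\ge 0}$ itself is càdlàg so that the Riemann sum 
$n^{-1}\sum_{k=1}^n \phi(X_{kt/n})$ converges $\mathbb{P}_x$-a.s. to the empirical mean $t^{-1}\int_0^t \phi(X_s)\,ds$. Therefore, 
using Fatou's lemma and the estimate (3.8), we obtain 

\[
\begin{aligned}
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y + M_t^x \right)
&\le \liminf_{n\to+\infty} \mathbb{P}_x\left( \left| \frac{1}{n}\sum_{k=1}^n \phi(X_{kt/n}) - \pi(\phi) \right| > y + M_t^x \right) \\
&\le \liminf_{n\to+\infty} 2\exp\left\{ -\frac{nV^2}{2b^2\sigma}(1-e^{-2\sigma t/n})\, g\!\left( \frac{2b\sigma y\,(1-e^{-\sigma t/n})}{V^2(1-e^{-2\sigma t/n})(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\} \\
&= 2\exp\left\{ -\frac{V^2 t}{b^2}\, g\!\left( \frac{b\,y\,\sigma}{V^2(1-e^{-\sigma t})\,\|\phi\|_{\mathrm{Lip}_d}} \right) \right\}.
\end{aligned}
\]

The proof of Theorem 2.6 is established. □ 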
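
The last step of the proof is a limit in $n$: the right-hand side of (3.7) for the $n$-point Riemann sum converges to the right-hand side of (2.7). The following Python sketch illustrates this convergence numerically. The function names `discrete_rhs` and `limit_rhs`, the argument `V2` (standing for $V^2$) and the parameter values are assumptions made for the illustration only.

```python
import math


def g(u: float) -> float:
    """g(u) = (1 + u) log(1 + u) - u from (2.5)."""
    return (1.0 + u) * math.log1p(u) - u


def discrete_rhs(n, y, t, sigma, b, V2, lip):
    """Right-hand side of (3.7) for the n-point regular subdivision of [0, t]."""
    a1 = -math.expm1(-sigma * t / n)        # 1 - exp(-sigma t / n)
    a2 = -math.expm1(-2.0 * sigma * t / n)  # 1 - exp(-2 sigma t / n)
    prefactor = n * V2 * a2 / (2.0 * b**2 * sigma)
    u = 2.0 * b * sigma * y * a1 / (V2 * a2 * (1.0 - math.exp(-sigma * t)) * lip)
    return 2.0 * math.exp(-prefactor * g(u))


def limit_rhs(y, t, sigma, b, V2, lip):
    """Right-hand side of (2.7), the formal limit of (3.7) as n grows."""
    u = b * y * sigma / (V2 * (1.0 - math.exp(-sigma * t)) * lip)
    return 2.0 * math.exp(-(V2 * t / b**2) * g(u))


# Hypothetical parameter values.
params = dict(y=0.5, t=10.0, sigma=1.0, b=1.0, V2=1.0, lip=1.0)
target = limit_rhs(**params)
errors = [abs(discrete_rhs(n, **params) - target) for n in (10, 100, 1000, 10_000)]
```

For these values the gap between the discrete and the limiting bound shrinks as the subdivision is refined, in line with the expansions $1 - e^{-2\sigma t/n} \sim 2\sigma t/n$ and $1 - e^{-\sigma t/n} \sim \sigma t/n$.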



4. Application to birth-death processes 

The purpose of this final part is to apply Theorem 2.6 to birth-death processes. To do so, 
we compute the associated Wasserstein curvature with respect to a large class of metrics 
on $\mathbb{N}$. In particular, choosing suitably the metric with respect to the transition rates of 
the generator allows us to consider processes with not necessarily bounded generators, 
such as the classical $M/M/\infty$ queueing process. 

Let $(X_t)_{t\ge 0}$ be a birth-death process on the state space $\mathcal{X} = \mathbb{N}$. This is a Markov 
process with a generator given for any function $f : \mathbb{N} \to \mathbb{R}$ by 

\[
\mathcal{L}f(x) = \lambda_x\,(f(x+1) - f(x)) + \nu_x\,(f(x-1) - f(x)), \qquad x \in \mathbb{N},
\]

where the transition rates $\lambda$ and $\nu$ are positive with $\nu_0 = 0$, conditions ensuring the 
irreducibility of the process. Letting 

\[
\mu(0) := 1, \qquad \mu(x) := \frac{\lambda_0\lambda_1\cdots\lambda_{x-1}}{\nu_1\nu_2\cdots\nu_x}, \qquad x \ge 1,
\]

we assume in the sequel that the process is ergodic, that is, it satisfies the following 
properties: 

\[
\sum_{x\ge 0} \mu(x) \sum_{y\ge x} \frac{1}{\mu(y)\lambda_y} = +\infty \qquad\text{and}\qquad C := \sum_{x\ge 0} \mu(x) < +\infty.
\]

Then the stationary distribution of the process is $\pi(x) = \mu(x)/C$, $x \in \mathbb{N}$. 

A fundamental example is the $M/M/\infty$ queue, also known as the birth-death process 
with immigration, which is an ergodic birth-death process $(X_t)_{t\ge 0}$ with an unbounded 
generator given by 

\[
\mathcal{L}f(x) = \lambda\,(f(x+1) - f(x)) + \nu x\,(f(x-1) - f(x)), \qquad x \in \mathbb{N},
\]

where the parameters $\lambda$ and $\nu$ are positive. The associated stationary distribution is 
the Poisson measure $\mathcal{P}_\rho$ on $\mathbb{N}$ with parameter $\rho := \lambda/\nu$. Denote by $\mathcal{B}_{n,p}$ the binomial 
distribution with parameters $n \in \mathbb{N}$ and $p \in (0,1)$. Using the Mehler-type convolution 
formula given by Chafaï (2006): 

\[
\mathrm{Law}(X_t \mid X_0 = x) = \mathcal{B}_{x,\,e^{-\nu t}} * \mathcal{P}_{\rho(1-e^{-\nu t})}, \qquad t > 0,
\]

we obtain by Chebyshev's inequality the following estimate, available for any $y > 0$: 

\[
\begin{aligned}
\mathbb{P}_x\big( X_t - \mathbb{E}_x[X_t] > y \big)
&\le \inf_{\tau>0} e^{-\tau y}\,\mathbb{E}_x\left[ e^{\tau(X_t - \mathbb{E}_x[X_t])} \right] \\
&\le \inf_{\tau>0} e^{-\tau y + \mathbb{E}_x[X_t](e^{\tau} - \tau - 1)} \\
&= \exp\left( y - (\mathbb{E}_x[X_t] + y)\log\left( 1 + \frac{y}{\mathbb{E}_x[X_t]} \right) \right),
\end{aligned}
\]

where in the second inequality we used $\log(1+u) \le u$, $u \ge 0$. Note that the latter 
Poisson-type deviation inequality is convenient for large time since we recover as $t$ tends to infinity 
the classical tail estimate for a centered Poisson random variable $X$ with intensity $\rho$: 

\[
\mathbb{P}\big( X - \mathbb{E}[X] > y \big) \le \exp\left( y - (\rho + y)\log\left( 1 + \frac{y}{\rho} \right) \right).
\]
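
As a sanity check on the classical Poisson tail estimate above, the following Python sketch compares the bound $\exp(y - (\rho + y)\log(1 + y/\rho))$ with the upper tail of a Poisson variable computed by summing its probability mass function. The function names, the truncation length `terms` and the tested values of $\rho$ and $y$ are hypothetical choices for illustration.

```python
import math


def chernoff_poisson(rho: float, y: float) -> float:
    """The bound exp(y - (rho + y) log(1 + y / rho)) on P(X - E[X] > y)."""
    return math.exp(y - (rho + y) * math.log1p(y / rho))


def poisson_upper_tail(rho: float, y: float, terms: int = 500) -> float:
    """P(X - rho > y) for X ~ Poisson(rho), summing the pmf over k > rho + y.

    The sum is truncated after `terms` values of k, so the result slightly
    underestimates the tail; for moderate rho the neglected mass is negligible.
    """
    k0 = math.floor(rho + y) + 1  # smallest integer strictly above rho + y
    total = 0.0
    for k in range(k0, k0 + terms):
        log_pmf = -rho + k * math.log(rho) - math.lgamma(k + 1)
        total += math.exp(log_pmf)
    return total


rho = 1.0
levels = [0.5, 1.0, 2.0, 5.0, 10.0]
comparison = [(y, poisson_upper_tail(rho, y), chernoff_poisson(rho, y)) for y in levels]
```

Each triple in `comparison` holds the deviation level, the computed tail and the bound, so the two can be compared level by level.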



On the one hand, the $M/M/\infty$ queueing process is a discrete approximation of the 
Ornstein-Uhlenbeck process, whose stationary distribution is Gaussian. On the other 
hand, Remark 2.10 states that under the Bakry-Émery curvature criterion, the empirical 
mean of a Langevin-type process, which generalizes the Ornstein-Uhlenbeck process, 
satisfies a Gaussian deviation inequality. Hence it is natural, by comparison with the 
diffusion framework, to investigate Poisson-type tail estimates for the empirical mean 
of positively curved birth-death processes, since they generalize similarly the $M/M/\infty$ 
queueing process. However, if we consider the classical metric on $\mathbb{N}$, we are not able 
to apply Theorem 2.6 to processes with unbounded generators because, in this case, $V$ 
is infinite. Since the Wasserstein curvature strongly depends on the metric, the idea to 
overcome this difficulty is to carry out the analysis with a Wasserstein curvature related to 
another metric on $\mathbb{N}$ that we choose suitably. 

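Definition 4.1. Given a positive function $u$ on $\mathbb{N}$, define the metric $\delta : \mathbb{N}\times\mathbb{N} \to [0,+\infty)$ 
as 

\[
\delta(x,y) := \left| \sum_{k=0}^{x-1} u_k - \sum_{k=0}^{y-1} u_k \right|, \qquad x,y \in \mathbb{N}, \qquad u_{-1} := 1.
\]

Let us compute the Wasserstein curvature associated to this metric. To do so, we use 
the notion of coupling operators initiated by Chen (1986). 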



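Definition 4.2. An operator $\tilde{\mathcal{L}}$ acting on the space of real-valued functions on $\mathbb{N}^2$ is 
said to be a coupling of the generator $\mathcal{L}$ if it satisfies the two following properties: 

(i) Marginality: $\tilde{\mathcal{L}}f_1(x,y) = \mathcal{L}f_1(x)$ and $\tilde{\mathcal{L}}f_2(x,y) = \mathcal{L}f_2(y)$; 

(ii) Normality: $\tilde{\mathcal{L}}h(x,x) = \mathcal{L}g(x)$. 

Here the two real-valued functions $f_1$ and $f_2$ on $\mathbb{N}$ are regarded as bivariate functions 
on $\mathbb{N}^2$, and $g$ is the univariate function $g(x) = h(x,x)$. 

Denote by $I$ the identity operator $I(f) = f$. Following Chen (1986), we introduce the 
classical coupling $\tilde{\mathcal{L}}$ by 

\[
\tilde{\mathcal{L}}f(x,y) := (\mathcal{L}\otimes I + I\otimes\mathcal{L})f(x,y)\,1_{\{x\neq y\}} + \mathcal{L}f(\cdot,\cdot)(x)\,1_{\{x=y\}}, \qquad x,y \in \mathbb{N}.
\]

Using the metric $\delta$, we have 

\[
\tilde{\mathcal{L}}\delta(x,y) =
\begin{cases}
\lambda_y u_y - \nu_y u_{y-1} - \lambda_x u_x + \nu_x u_{x-1}, & \text{if } x \le y, \\
-\lambda_y u_y + \nu_y u_{y-1} + \lambda_x u_x - \nu_x u_{x-1}, & \text{otherwise.}
\end{cases}
\]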



Theorem 4.3. The Wasserstein curvature $\sigma_\delta$ with respect to the metric $\delta$ of the 
birth-death process $(X_t)_{t\ge 0}$ is given by the formula 

\[
\sigma_\delta = \inf_{x\in\mathbb{N}} \left\{ \nu_{x+1} + \lambda_x - \nu_x\frac{u_{x-1}}{u_x} - \lambda_{x+1}\frac{u_{x+1}}{u_x} \right\}. \tag{4.1}
\]

Proof. Denote $\alpha := \inf_{x\in\mathbb{N}} \{ \nu_{x+1} + \lambda_x - \nu_x u_{x-1}/u_x - \lambda_{x+1} u_{x+1}/u_x \}$ and assume first that $\sigma_\delta$ 
and $\alpha$ are not $-\infty$. 

Consider on $\mathbb{N}$ the increasing Lipschitz function $f(x) = \sum_{k=0}^{x-1} u_k$ with Lipschitz 
seminorm $\|f\|_{\mathrm{Lip}_\delta} = 1$. We have for any integers $x < y$ and any $t > 0$: 

\[
\frac{P_t f(y) - f(y)}{t} - \frac{P_t f(x) - f(x)}{t} = \frac{P_t f(y) - P_t f(x) - \delta(x,y)}{t} \le \frac{e^{-\sigma_\delta t} - 1}{t}\,\delta(x,y),
\]

so that we obtain at the limit $t \to 0$: 

\[
\lambda_y u_y - \nu_y u_{y-1} - \lambda_x u_x + \nu_x u_{x-1} = \mathcal{L}f(y) - \mathcal{L}f(x) \le -\sigma_\delta\,\delta(x,y).
\]

Therefore, taking $y = x+1$ and dividing by $u_x$ entail the inequality $\sigma_\delta \le \alpha$. 

On the other hand, we aim at proving that the Wasserstein curvature $\sigma_\delta$ is bounded 
below by $\alpha$. To do so, we use the coupling argument derived from the proof of Chen 
(1996), Theorem 1.1. Note that $\alpha$ rewrites as 

\[
\alpha = \inf_{x\in\mathbb{N}} \frac{-\tilde{\mathcal{L}}\delta(x,x+1)}{\delta(x,x+1)},
\]

where $\tilde{\mathcal{L}}$ is the classical coupling operator defined above, so that we have 

\[
\tilde{\mathcal{L}}\delta(x,x+1) \le -\alpha\,\delta(x,x+1), \qquad x \in \mathbb{N}.
\]

As the following identities hold for any $x,y \in \mathbb{N}$ such that $x < y$: 

\[
\tilde{\mathcal{L}}\delta(x,y) = \sum_{k=x}^{y-1} \tilde{\mathcal{L}}\delta(k,k+1), \qquad \delta(x,y) = \sum_{k=x}^{y-1} \delta(k,k+1),
\]

we get from the latter inequality and using the symmetry between $x$ and $y$ the inequality 

\[
\tilde{\mathcal{L}}\delta(x,y) \le -\alpha\,\delta(x,y), \qquad x,y \in \mathbb{N}, \tag{4.2}
\]

which ensures the contraction property (2.3), and so the desired estimate $\sigma_\delta \ge \alpha$. The 
proof is achieved in the finite case. 

Finally, if at least $\sigma_\delta$ or $\alpha$ is $-\infty$, we are able to adapt the previous argument to show 
that both are actually infinite. □ 

Remark 4.4. Van Doorn (1985, 1987) proved that the spectral gap $\lambda_1$, which equals 
the so-called decay parameter in his papers, is actually the supremum of the Wasserstein 
curvatures given in Theorem 4.3 over the possible metrics $\delta$ defined in Definition 4.1. 
Later, this result was rediscovered by Chen (1996) with the coupling method 
emphasized in the proof above. 

Once the metric $\delta$ has been introduced in full generality, let us introduce an assumption 
relating the weight $u$ and the transition rates of the generator of the birth-death process 
$(X_t)_{t\ge 0}$. We denote in the sequel $a \wedge b := \min\{a,b\}$. 

Assumption A. There exist two constants $K > 0$ and $C > 0$ such that 

\[
\left( \inf_{x\ge 0} \lambda_x \right) \wedge \left( \inf_{x\ge 1} \nu_x \right) \ge K \qquad\text{and}\qquad u_x \le C\left( \frac{1}{\sqrt{\lambda_x}} \wedge \frac{1}{\sqrt{\nu_{x+1}}} \right), \qquad x \in \mathbb{N}.
\]

Under Assumption A, we have a control on the metric $\delta$ as follows. The proofs are 
straightforward. 

Lemma 4.5. Under Assumption A, the two inequalities below hold: 

(1) $\delta(x,y) \le \dfrac{C}{\sqrt{K}}\,|x-y|$, $\quad x,y \in \mathbb{N}$; 

(2) $\sup_{x\in\mathbb{N}} \left\{ \lambda_x\,\delta(x,x+1)^2 + \nu_x\,\delta(x,x-1)^2 \right\} \le 2C^2$. 

Remark 4.6. If at least one of the transition rates of the generator is unbounded, then 
the function $u$ vanishes at infinity so that the two metrics in Lemma 4.5(1) are not 
bi-Lipschitz equivalent. In particular, the identity function $f(x) = x$ is not Lipschitz on $\mathbb{N}$ 
with respect to the metric $\delta$. 

Now we are able to state the following tail estimate for the empirical mean of the 
birth-death process $(X_t)_{t\ge 0}$. 

Corollary 4.7. Assume that the Wasserstein curvature $\sigma_\delta$ given by (4.1) is positive and 
that Assumption A is satisfied. Letting $\phi \in \mathrm{Lip}_\delta(\mathbb{N})$, for any initial state $x \in \mathbb{N}$, any $t > 0$ 
and any $y > 0$, we have the following Poisson-type deviation inequality: 

\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -2Kt\, g\!\left( \frac{\sigma_\delta\,y}{2C\sqrt{K}\,(1-e^{-\sigma_\delta t})\,\|\phi\|_{\mathrm{Lip}_\delta}} \right) \right\}, \tag{4.3}
\]

where $M_t^x := \sigma_\delta^{-1} t^{-1} (1-e^{-\sigma_\delta t})\,\|\phi\|_{\mathrm{Lip}_\delta} \sum_{z\in\mathbb{N}} \delta(x,z)\,\pi(z)$ and $g$ is the function given in 
(2.5). 

Proof. Using Lemma 4.5, we get the result by applying Theorem 2.6 with $b = C/\sqrt{K}$ 
and $V^2 = 2C^2$. □ 

Remark 4.8. The Poisson-type deviation inequality (4.3) is comparable to that 
obtained recently by Liu and Ma (2009) by using martingale techniques together with the 
so-called Lipschitz spectral gap. We mention, however, that there is a one-to-one 
correspondence between this object and the Wasserstein curvature according to the variational 
formulas given by Chen (1996), Theorem 1.1. 

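To finish this work, let us return to the case of the $M/M/\infty$ queueing process. For the 
sake of simplicity, we assume in the sequel that the intensity $\rho$ of the process equals 1. 
Choosing $u_x := (x+1)^{-1/2}$, $x \in \mathbb{N}$, in the definition of the metric $\delta$, a brief computation 
shows that the Wasserstein curvature $\sigma_\delta$ equals $\nu/2$, which is half of the exact curvature 
$\nu$ given by Chafaï (2006). Moreover, the transition rates of the generator satisfy 
Assumption A with $C = \sqrt{K} = \sqrt{\nu}$. Hence, Corollary 4.7 entails for any Lipschitz function 
$\phi \in \mathrm{Lip}_\delta(\mathbb{N})$, any $t > 0$, any initial state $x \in \mathbb{N}$ and any $y > 0$, 

\[
\mathbb{P}_x\left( \left| \frac{1}{t}\int_0^t \phi(X_s)\,ds - \pi(\phi) \right| > y + M_t^x \right)
\le 2\exp\left\{ -2\nu t\, g\!\left( \frac{y}{4(1-e^{-\nu t/2})\,\|\phi\|_{\mathrm{Lip}_\delta}} \right) \right\}.
\]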
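
The "brief computation" giving the curvature $\nu/2$ for the $M/M/\infty$ queue can be reproduced numerically from formula (4.1). The Python sketch below evaluates the terms of the infimum on a truncated range of states; the helper name `curvature_terms`, the truncation level and the numerical value of the death rate `v` are hypothetical choices made for illustration, and the truncated minimum only approximates the infimum from above.

```python
def curvature_terms(lam, nu, u, x_max):
    """Terms of the infimum in (4.1) for x = 0, ..., x_max - 1.

    lam(x) and nu(x) are the birth and death rates, u(x) is the positive
    weight defining the metric delta, with the convention u(-1) := 1.
    """
    terms = []
    for x in range(x_max):
        u_prev = 1.0 if x == 0 else u(x - 1)
        terms.append(
            nu(x + 1) + lam(x)
            - nu(x) * u_prev / u(x)
            - lam(x + 1) * u(x + 1) / u(x)
        )
    return terms


v = 2.0  # hypothetical death-rate parameter of the M/M/infinity queue


def lam(x):
    """Constant birth rate; intensity rho = 1 means it equals v."""
    return v


def nu(x):
    """Linear death rate v * x, so nu(0) = 0."""
    return v * x


def u(x):
    """Weight u_x = (x + 1)^(-1/2) chosen in the text."""
    return (x + 1) ** -0.5


terms = curvature_terms(lam, nu, u, 100_000)
sigma_estimate = min(terms)
```

In this example every term stays strictly above $\nu/2$ and decreases towards it as $x$ grows, so the infimum is approached at infinity rather than attained, which is consistent with the value $\sigma_\delta = \nu/2$ stated above.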







Remark 4.9. An inequality such as the one above allows us to consider unbounded 
functions $\phi$ such as, for instance, the square root function, which is Lipschitz with respect to 
the metric $\delta$. However, as noted in Remark 4.6, the price to pay is to require $\phi \in \mathrm{Lip}_\delta(\mathbb{N})$, 
which unfortunately excludes the identity function since the generator is unbounded. 
Hence we conjecture that in the case of the $M/M/\infty$ queueing process, the deviation of 
the empirical mean of Lipschitz functions with respect to the classical metric is of the 
Poisson type. See also the recent work of Guillin et al. (2009) for an approach to this 
problem through transportation-information inequalities. 